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Finite sample properties of multiple imputation estimators under 
the linear regression model are studied. The exact bias of the multiple 
imputation variance estimator is presented. A method of reducing the 
bias is presented and simulation is used to make comparisons. We also 

pH , show that the suggested method can be used for a general class of 

^0 ' linear estimators. 

j^ , 1. Introduction. Multiple imputation, proposed by Rubin (1978), is a 

procedure for handling missing data that allows the data analyst to use 
standard techniques of analysis designed for complete data, while providing 
a method to estimate the uncertainty due to the missing data. Repeated im- 
^ ' putations are drawn from the posterior predictive distribution of the missing 

cn . values under the specified model given a suitable prior distribution. 

^jj I Schenker and Welsh [(1988), hereafter SW] studied the asymptotic prop- 

\^ • erties of multiple imputation in the linear-model framework, where the scalar 

^^ , outcome variable Yi is assumed to follow the model 

O ■ Yi = x'i/3 + e, 

::a: (1) 

-(— > 



e 



X 



e,'-^^-iV(0,a2 



The p-dimensional Xj's are observed on the complete sample and are as- 

^ I sumed to be fixed. 

To describe the imputation procedure, we adopt matrix notation. With- 
out loss of generality, we assume that the first r units are the respon- 

C^ I dents. Let y,. = (Yi, 12, ■ ■ ■ , Yr)' and Xr = (xi, X2, . . . , x^)'. Also, let yn-r = 

{Yr+i,Yr+2, ■ ■ ■,yny and Xn-r = (xr+i,x.r+2, . . • ,x„)'. The Suggested method 
of multiple imputation for model (1) is as follows: 
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[Ml] For each repetition of the imputation, k = l, . . . , M, draw 

z.'2 /,,2 



*2 I i-i-d- / \ "Z / z 

0"(fc) \yr ~ {r-p)crr/Xr~p, 



(2) 

where a^ = (r -p)-V;[/ - XriX^Xr)~'X',]yr. 
[M2] Draw 



(3) 



i.i.d. 



'1_*2 



/3(fc) I (yr,^(fc)) '^ N{^,, (x;,x,)-V^,)j 



where ^^ = {X^.Xr)~^X'^yr. 
[M3] For each missing unit j = r + 1, . . . ,n draw 



(4) 



^■^■iv(o,a;^ 



*2 ^ 



ei{fc) I lP(fc)>o-(fc)J 

Then ^^*^fc) = ^jf^J^) + e*?;.) is the imputed value associated with unit j 
for the fcth imputation. 
[M4] Repeat [M1]-[M3] independently M times. 

The above procedure assumes a constant prior for {(3, logcr) and an ignorable 
response mechanism in the sense of Rubin (1976). 

At each repetition of the imputation, k = 1,. . . ,M, we can calculate the 
imputed version of the full sample estimators 
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I{k),n ■ 



^^iA 



A=l 



i=r-\-l 



i(k) 



1=1 



and 



Vi 



I{k),n 



2^XjXj I ^l(k),n^ 



, i=l 



where 

' r n 

.1=1 i=r+l 

The proposed point estimator for the regression coefficient based on M re- 
peated imputations is 

M 
k=l 

The proposed estimator for the variance of the point estimator (5) is 

(6) VM,n = WM,n + {l+M~^)BM,n, 

where 

M 

(7) WM,n = M-' Y. ^m,n 

fc=l 
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and 

M 

(8) BM,n = {M -1)" ^{Pli^k),n-(^M,n){^I{k),n-(^M,n)'- 

k=l 

Rubin (1987) called WM,n the within-imputation variance and called i?Af,n 
the between-imputation variance. We call VM,n Rubin^s variance estimator. 
SW studied the asymptotic properties of the point estimator (5) and its 
variance estimator (6). Under regularity conditions they showed that 

(9) limE0^j^^-P)=O 
and 

(10) lim n{E(yM,n) - Var(^^^„)} = 0, 

where the reference distribution in (9) and (10) is the regression model (1) 
with an ignorable response mechanism. 

Note that (9) and (10) require that the sample size n go to infinity, for 
fixed M, M > 1. Finite sample properties are not discussed by SW. The next 
section gives finite sample properties of the multiple imputation estimators. 
In Section 3 a simple modified version of the SW method is proposed to min- 
imize the finite sample bias of the multiple imputation variance estimator. 
In Section 4 extensions are made to a more general class of estimators. In 
Section 5 results of a simulation study are reported. In Section 6 concluding 
remarks are made. 

2. Finite sample properties. The following lemma provides the covari- 
ance structure of the multiply-imputed data set generated by [M1]-[M4]. 

Lemma 2.1. Let Yi be the observed value of the ith unit, i = 1,2,. . . ,r 
(r > p + 2), and let Y*,*j^-, be the imputed value associated with the jth unit for 
the kth repetition of the multiple imputation generated by the steps [Ml]- 
[M4]. Then, under model (1) with an ignorable response mechanism, 

(11) Cov(y„ K- ) = xKx;x,)-ix,a2 



and 



** 



Cov{Y;C,^,Yj^,)) 



fl2) f (1 + A)x-(X^Xr) ^Xj(j^ -I- Acr^, ifi = jandk = s, 

= < {1 + X)x'^{Xl.Xr)~^Xja'^, ifii^j andk = s, 

[x'i{Xl.Xr)-^Xja^, ifk^s, 

where A = {r—p — 2)~^{r—p) and the expectations in (11) and (12) are taken 
over the joint distribution of model (1) and the imputation mechanism with 
the indices of respondents fixed. 
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For the proof, see Appendix A. 

Note that the imputed value Y^t^ can be decomposed into three indepen- 
dent components as 

(13) Y*^l) = Xi^, + x,(/3^,) - ^,) + e*l,y 

The first component Xj^^ has mean Xj/3 and variance x^(X^Xf.)~^XjO"'^, the 
second component has mean zero and variance Ax^(X^Xr)~^Xj(T^ and the 
third component has mean zero and variance \a^. 

The following theorem gives the mean and variance of the point estima- 
tor f3]yj „ of the regression coefficient and the mean of the multiple imputa- 
tion variance estimator VM,n- Again, the expectations in the following the- 
orem are taken over the joint distribution of model (1) and the imputation 
mechanism with the indices of respondents fixed. 

Theorem 2.1. Under the assumptions of Lemma 2.1, 

(14) E0,,^J=P, 

(15) Var(;3j,,^„) = {X'^Xr-y^a^ + M~^ X[{X',Xr)-' - (^X^)" V', 

(16) E{WM,n) = (X;X„)-i{l + {n-pr\X -l){n- r)}a^ 
and 

(17) E{BM,n) = A[(X;X,)-i - «X„)- V', 

where A = {r — p — 2)~^ {r — p) , WM,n is the within-imputation variability 
defined in (7) and Bm^u is the between-imputation variability defined in (8) . 
The bias of the multiple imputation variance estimator is 

n — r)a 



E{VM,n) - Var(^jv/,n) = {X'^X^)-\n-p)-\\ - 1)( 

+ (A-i)[(x;x,)-i-(x;x„)-v 



For the proof, see Appendix B. 

As is observed from (15), the point estimator /fl^/n fo^ infinite M achieves 
the same efficiency of [3^., the estimator based on the respondents. In fact, 

lim ^PM,n 



M 

fc=i 
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because J2^=i ^i{yi~'^iPr) = by standard regression theory, limA/-+oo M"^ x 
J2k=iif^*(k) ~ Pr) = by the law of large numbers and \\niM-^ooM~^ x 
X]fc=i e.*?f.\ = by the law of large numbers. By (15) the variance of (3^,71 
can be written as 

(19) Var(^jv/,n) = Var(/3,,) + M-U[Var(^,) - Var(^,J]. 

The second part in the right-hand side of (19) is the increase in variance 
due to using j3]\^^ instead of (3^.. By (17) that increase can be unbiasedly 
estimated by M~^BM,n- Thus, an alternative estimator for the total variance 
of /3^/,„ that is unbiased for (15) is 

(20) Va.iiP,) + M-^BM,n, 

where Var(/3,.) is the standard variance estimator that treats the respondents 
as if they are the original sample. 

In large samples, A = 1 and the bias of the multiple imputation vari- 
ance estimator for the imputed regression coefficient is negligible. The bias 
term (18) is an exact bias for r > p + 2. 

The total variance of /3^ „ can be decomposed into three parts: 

Var(^jv/,n) = Var(^„) + Var(^, - ^„) + Var(^Af,n " ^r)- 

The first part, the second part and the third part can be called the sampling 
variance, the variance due to missingness and the variance due to imputa- 
tion, respectively. Rubin's multiple imputation uses WM,n to estimate the 
sampling variance, BM,n to estimate the variance due to missingness and 
M~^BM,n to estimate the variance due to M repeated imputations. The 
first term on the right-hand side of (18) is the bias of WM,n as an estimator 
of the sampling variance and the second term on the right-hand side of (18) 
is the bias of BM,n as an estimator of the variance due to missingness. The 
imputation variance is unbiasedly estimated by M~^BM,n- 

3. A modification. We consider ways of reducing the bias of the multiple 
imputation variance estimator. Recall that multiple imputation is charac- 
terized by the method of generating the imputed values and by the variance 
formula. The variance formula directly uses the complete sample variance 
estimator so that it can be implemented easily using existing software. 

One estimator of variance is the alternative variance estimator in (20), 
which involves direct estimation of parameters from the respondents. A sim- 
ilar idea was used by Wang and Robins (1998). Then Rubin's variance esti- 
mator VM,n in (6) is no longer required. 

If we want to use Rubin's variance formula, one approach is to modify 
the imputation method to minimize the bias term in (18). To find a best 
imputation procedure, instead of fixing a constant prior for logcr, we use a 
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class of prior distributions indexed by hyperparameters to express a class of 
imputation methods. As a conjugate prior distribution for a^, we choose a 
scaled inverse chi-square with degrees of freedom uq and scale parameter ctq 
as the hyperparameters; that is, the prior distribution of a"^ is the distri- 
bution of J^c^o/xL- III the modified imputation method we determine the 
values of the hyperparameters, vq and CTq, to remove the bias of the multiple 
imputation variance estimator. 

Using the hyperparameters, the posterior distribution for a^ is written 

(21) o-g I Yr '~ ■ [i^OCTq + (r - p)a'^]/x(uo+r-p)^ 
so that 

(22) £(a(t)) = {uo + r-p-2r\i^oal + {r-p)a^). 

Note that SW used z^o = 0- Using the arguments of the proofs of Lemma 2.1 
and Theorem 2.1, the bias of the multiple imputation variance estimator 
based on the posterior distribution in (21) is 

B[as{VM,n) = iX'^Xn)-\n-p)~^{{Xo-l)(n-r)a^ + Xi{n-r)a^} 
+ {{X^Xr)-^ - (X;X„)-n{(Ao - ly + Aiag}, 

where Aq = {i^o + r — p — 2)~^ (r — p) and Ai = {vQ + r —p — 2)~^i^Q. The bias is 
zero when 1^0 = 2 and a^ = 0, which is equivalent to generating the posterior 
values of o"^ from the inverse chi-square distribution with degrees of freedom 
u = r — p + 2 instead of z^ = ? — p in (2). Thus, the choice oii' = r — p + 2 
in (2) makes Rubin's variance estimator unbiased for a finite sample. 

We can also derive the optimal prior using the recent work of Meng and Zaslavsky 
(2002) on single observation unbiased priors (SOUP). Meng and Zaslavsky 
[2002, Section 6] showed that 

(24) 7r(cj2) oc (a2)-2 

is the unique SOUP among all continuously relatively scale invariant priors 
for the scale family distribution. Note that the prior density of the scaled 
inverse chi-square distribution can be written as 

(25) Tr{a^) oc (a2)-K/2+i) exp[-z.o^oV(2^')]- 

Thus, the unique SOUP for a^ in (24) corresponds to the scaled inverse 
chi-square distribution with f o = 2 and Uq = 0, which makes the posterior 
mean in (22) unbiased for o"^. 
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4. Extensions. In this section we investigate the properties of the mod- 
ified method when it is apphed to estimators other than the regression co- 
efficients. Let the complete sample point estimator be a linear estimator of 
the form 

n 

(26) en = Y.a,Yi 

i=l 

for some known coefficients a^. Also, let the complete sample estimator for 
the variance of On be a quadratic function of the sample values of the form 

n n 

(27) K = 5]^J7i,y,y,- 

for some known coefficients Oj^. 

For the VM,n to be asymptotically unbiased for the variance of 9M,n, we 
need the "congeniality" assumption as defined in Meng (1994). The conge- 
niality assumption in our context implies 

(28) Var(^oo,n) = Var(4) + Var(^oo,n - 0^). 

For the case of On = /9„ discussed in Section 2, congeniality holds because 
Ooo,n = Pr and 

Var(^, - ^„) = {XlXrY^a^ - {X'^XnT^a^ 

= Var(^,)-Var(^J. 

We restrict our attention to the case of congenial multiple imputation esti- 
mation because otherwise the variance estimator will be biased even asymp- 
totically. 

The following theorem expresses the bias of Rubin's variance estimator 
applied to the general class of estimators On in (26) and Vn in (27) under 
the SW method. 

Theorem 4.1. Let the assumptions of Lemma 2.1 hold. Assum,e that the 
com,plete sam,ple variance estimator Vn in (27) is unbiased for the variance 
of the complete sample point estimator On in (26) under model (1). Let 
the congeniality assumption (28) hold. Then the multiple imputation point 
estimator 0M,n is unbiased with variance 

{r r n n n '\ 

i=l i=l j=r+l i=r+lj=r+l ) 

(29) 

( n n n \ 

\ i=r+l j=r+l i=r+l ) 
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The bias of VM,n ds an estimator ofVai{9M,n) is 

E{VM,n)-yai{dM,n) 

{r n n n 

2E E ^i,h^j + {l + \) E E ^ijhij 
i=l j=r+l i=r+lj=r+l 

(30) „ > 

+ (A-1) E ^n 

j=r+l i 



^2 



>o 



2 



+ (^-1)] E "i+ E E "i«i^»i 
where X = {r — p — 2)~^(r — p) and hij = x^(X^Xr)~^Xj. 

For the proof, see Appendix C. 

The first term on the right-hand side of (30) is the bias of WM,n as an 
estimator of Var(0„) and the second term is the bias of (1 + M~^)BM,n as 
an estimator of Var(&M,n — dn)- 

Note that the bias term in (30) can be written 

n n 

(31) Bias(yM,n) = 2E E %/iii'^' + (A-l)t/cT2, 
where 

{n n n "\ 

E E (^*j + «i«i)^ii + E (^3i + "i) 
i=r+lj=r+l j=r+l ) 

= trace {{Qn-r + OCa-ra!^_^){Xn-r[X[XrY^^'n~T + ^n-rW 

with Q^n-r the lower-right (n — r) x (n — r) partition of 17 = [0,ij] and ctn-r = 
(or+i, . . . , ttn)'- By the nonnegative definiteness of 0,n-r, ctn-rCt'n-r^ Xn-r{X'^Xr)~^ X'^^ 
and In-ri the U term is nonnegative. Thus, if we use the modified method 
suggested in Section 3 so that we have A = 1, the variance term (29) de- 
creases and the bias will be reduced to 

n n 

(32) Bias(Vjv/,n) = 2E E ^ii^u^'' 

i=l j=r+l 

which is always smaller than the original bias in (31) because X = {r — p — 
2)-i(r-p) > 1. 

A sufficient condition for the bias term in (32) to be zero is 

(33) = c[/„ - Xn{X[,Xn)-^Xl] 
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for some constant c > 0. To show this, let 5ij be the (i, j)th element of /„ 
Then 

n n 



ex' (z;x,)-ixj - ex' (x;x„)-^ 



J2^'^i 



i=l 



iX'^.Xr)-^Xj = 



and the bias term in (32) equals zero, which alternatively justifies our asser- 
tion of zero bias in Section 3. 

5. Simulation study. To see the effect of changing u = r — p into u = 
r — p + 2, we performed a limited simulation. The simulation study can be 
described as a 2 x 3 x 2 factorial design with L = 50,000 samples within each 
cell, where each sample is generated from 

(34) Yi = 2 + Axi + ei, 

where Xj = 5 + 10(n + l)~^i and Cj are independently and identically dis- 
tributed from the standard normal distribution. Thus, the population mean 
of {X,Y) is (10, 42). The factors are as follows: 

1. factor A, method of multiple imputation — SW method {v = r — 2), new 
method {v = r); 

2. factor B, response rate (r/n) — 0.8, 0.6, 0.4; 

3. factor C, sample size (n) — 20, 200. 

We used a uniform response mechanism and M = 5 repeated imputations. 

Table 1 presents the mean, the variance and the percentage relative ef- 
ficiency of the point estimators under the two imputation schemes. The 
percentage relative efficiency (PRE) is 

PRE= [Vari(^sw)]"^Vari(^„ew) x 100, 

where the subscript L denotes the distribution generated by the Monte Carlo 
simulation. Both imputation methods are unbiased for the two parameters 
and the Monte Carlo results are consistent with that property. The new 
procedure is slightly more efficient than the SW procedure and the efficiency 
is greater for lower response rates. 

Table 2 presents the relative bias and z-statistics for the variance es- 
timators. The relative bias of V as an estimator of the variance of 9 is 
calculated as [Vavi{9)]~^[EL{V) —YaiLi^)], and the z-statistic for testing 
Ho : E{V) = Var(^) is 

(35) .-staUsUc- L^'^IE.V) -Y.^m 



{El[V - El{V) + Vari(e) - (0 - £i(9))2|2}l/2 
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Table 1 

Mean, variance and the percentage relative ejficiency (PRE) of the multiple imputation 

point estimators under the two different imputation schemes (^50,000 samples) 





n 


r/n 


Mean 


Variance 


PRE 


Parameter 


SW 


New 


SW 


New 


(%) 


Mean 


20 


0.8 


42.0 


42.0 


0.066533 


0.066056 


99.2 






0.6 


42.0 


42.0 


0.095951 


0.094192 


98.2 






0.4 


42.0 


42.0 


0.150521 


0.142537 


94.7 




200 


0.8 


42.0 


42.0 


0.006594 


0.006583 


99.8 






0.6 


42.0 


42.0 


0.009069 


0.009057 


99.9 






0.4 


42.0 


42.0 


0.014143 


0.014090 


99.6 


Slope 


20 


0.8 


4.0 


4.0 


0.008873 


0.008785 


99.0 






0.6 


4.0 


4.0 


0.013148 


0.012956 


98.5 






0.4 


4.0 


4.0 


0.018190 


0.017443 


95.9 




200 


0.8 


4.0 


4.0 


0.000780 


0.000779 


99.9 






0.6 


4.0 


4.0 


0.001084 


0.001084 


100.0 






0.4 


4.0 


4.0 


0.001681 


0.001674 


99.6 



A heuristic argument for the justification of the z-statistic is made in Ap- 
pendix D. 

Under the SW imputation, the relative bias is larger for smaller samples 
and for smaller response rates. The new imputation produces much smaller 
relative bias for the variance estimator. For large sample sizes both imputa- 
tion methods produce negligible relative biases of the variance estimators. 



Table 2 

Relative bias (RB) and the z-statistic of Rubin's variance estimators under 

the two different imputation schemes (^50,000 samples) 





n 


r/n 


RB 




z-statistic 


Parcimeter 


SW 


New 


SW 


New 


Mean 


20 


0.8 


0.0624 


0.0008 


9.33 


0.12 






0.6 


0.1520 - 


-0.0045 


21.27 


-0.66 






0.4 


0.3221 


0.0074 


37.23 


0.97 




200 


0.8 


-0.0086 - 


-0.0124 


-1.36 


-1.94 






0.6 


0.0040 - 


-0.0045 


0.61 


-0.69 






0.4 


0.0155 - 


-0.0037 


2.29 


-0.55 


Slope 


20 


0.8 


0.0706 


0.0095 


10.39 


1.41 






0.6 


0.1560 - 


-0.0064 


21.21 


-0.90 






0.4 


0.3418 


0.0046 


37.81 


0.59 




200 


0.8 


0.0129 


0.0107 


2.03 


1.67 






0.6 


0.0175 


0.0031 


2.69 


0.48 






0.4 


0.0240 


0.0047 


3.60 


0.70 
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Table 3 

Mean length and the coverage of 95% confidence intervals under the 

two different imputation schemes (^50,000 samples) 





n 


r /n 


Mean 


length 


Coveri 

SW 


»ge (%) 


Parameter 


SW 


New 


New 


Mean 


20 


0.8 


1.1397 


1.0996 


95.4 


95.0 






0.6 


1.5305 


1.4032 


95.9 


94.7 






0.4 


2.2635 


1.9213 


96.4 


94.7 




200 


0.8 


0.3232 


0.3223 


95.0 


94.9 






0.6 


0.3949 


0.3928 


94.8 


94.7 






0.4 


0.5235 


0.5170 


94.8 


94.6 


Slope 


20 


0.8 


0.4169 


0.4020 


95.5 


95.0 






0.6 


0.5646 


0.5175 


95.8 


94.7 






0.4 


0.7897 


0.6696 


96.7 


94.9 




200 


0.8 


0.1124 


0.1121 


95.2 


95.1 






0.6 


0.1374 


0.1364 


95.0 


95.0 






0.4 


0.1811 


0.1789 


95.0 


94.7 



Table 3 displays the mean lengths and the coverages of 95% confidence in- 
tervals. The confidence intervals are {9 — tvV ,0 + t\V),^h.exet = to,Q2b,v and z^ 
is computed using the method of Barnard and Rubin (1999). The coverages 
of the confidence intervals are all close to the nominal level. For small sam- 
ple sizes the confidence intervals based on the new imputation are slightly 
narrower than the confidence intervals based on the SW method. Interval 
estimation shows less dramatic results for small sample size than variance 
estimation. This is partly because the distributions of point estimates are 
bell-shaped and partly because the degrees of freedom of Barnard and Rubin 
(1999) attenuate the effect of small sample bias of the variance estimator. 

6. Discussion. We study the mean and the covariance structure of the 
data set generated by the conventional multiple-imputation method under 
the regression model. Using the mean and the covariance structure of the 
multiply-imputed data set, we investigate the exact bias of Rubin's variance 
estimator. The bias of Rubin's variance estimator is negligible for large sam- 
ple sizes, as discussed by SW, but the bias may be sizable for small sample 
sizes. We propose a simple modified imputation method that is more effi- 
cient than the SW method and makes Rubin's variance estimator unbiased. 
When applied to a general class of linear estimators, the proposed method 
produces more efficient estimates and has smaller bias for variance estima- 
tion than that of the SW method. In a simulation study we found that the 
bias of Rubin's variance estimator under the SW method is remarkably large 
for small sample sizes. The bias of Rubin's variance estimator under the new 
method is reasonably small in the simulation. 
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In practice, the small sample bias of the multiple imputation variance 
estimator is of special concern when the scale parameter a is generated with 
small degrees of freedom. One such example is stratified sampling, where the 
sample selection is performed independently across the strata. In a strati- 
fied sample the assumption of equal a across the strata is not a reasonable 
assumption. Thus, the scale parameters have to be generated independently 
within each stratum, using only the respondents in the stratum, which often 
makes the degrees of freedom very small even for a large data set. The new 
method will significantly reduce the bias in this case. 

A commonly used imputation model is the cell mean model, where the 
study variables are assumed homogeneous within each cell. Under the cell 
mean model Rubin and Schenker (1986) considered various multiple impu- 
tation methods. Since the imputation is performed separately within each 
cell, the scale parameters are generated independently within each cell. Thus, 
the methods considered by Rubin and Schenker (1986) are subject to small 
sample biases. The biases can be significant when there are a small number 
of respondents within each cell. Recently, Kim (2002) proposed an alterna- 
tive imputation method of making the variance estimator unbiased under 
the cell mean model. 

APPENDIX A 

Proof of Lemma 2.1. 
By (3), 

(A.l) ii;(x;./3^,)|y,)=x;4. 

So 

Gov (y^, r;(*fc)) = Gov {y„ e{y;;;^-^ |y,)} = Gov(yi, x^.^,) = x^(x;x,)-ix,a^ 

Now, by (2), 

(A.2) E(ag) = E{a*^^\yr) = E{Xal) = Xa\ 

where \= {r — p — 2)~^(r — p). So 

Gov (xi/3^;^),x^./3(fc) ly,.) = Gov [-E(x-/3(fc) |y^, a*^^),E{x'jPl^^ lYr, o-(*fc))|y^] 

^k)^^'jP*{k)\yr,(^lk))\yr] 



+ E[CoY{^,(3r,.,^'.(3Uyr,ar,))\yr 



(A.3) 



E[4{X',Xrr^^jali^\yr] 



and, for kj^s, 

(A.4) Gov(x:/3^,),x;./3^,)|y,) = 0. 
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Hence, by (A.l) and (A.3), 

Cov(x:/3^,),x;./3^,)) = Cov[^(x:/3^,)|y,),ii;(x;./3^,)|y,)] 

+ i?[Cov(x^/3^,),x;./3^,)|y,)] 

(A.5) 

= Cov(x^/3,,x^/3^) + E{x.'i{Xl.Xr)-^XjajX} 

= {l + X)x'i{X'^Xry^Xja^ 
and, for A; 7^ s, by (A.l) and (A. 4), 

Cov(x^/3^,),x;./3^,)) = i?[Cov(x:/3^,),x;./3^,)|y,)] 

(A.6) + Gov [^(x:/3^fc) |y.) , Ei^'^P^-^ |y,)] 

= Cov(x^^^,x^-^J = ji.'i{X'X)-^^ja\ 
By (4) we have, for i^ j , 

(A. 7) Gov (y^*) , y;^:^ ) = Gov (x^/3^,) , x;./3^,) ) 

and 

(A.8) Gov(y4*),y*(*)) = i ,,^,i'^\^ f^ ., ., 

^ ' ^^' [Cov(xi/3(fc),x;-/3(fc)), if^T^J. 

Therefore, inserting (A.2), (A.5) and (A.6) into (A.7) and (A.8), result (12) 
fohows. D 

APPENDIX B 

Proof of Theorem 2.1. Before we calculate the variance of the mul- 
tiple imputation estimator we provide the following matrix identity. 

Lemma B.l. Let Xn he an nx p matrix of the form X^ = {X^,Xl^_j,), 
where X^ is an r x p matrix and Xn~r is an [n — r) x p matrix. Assume 
that X^Xn and X'^Xy. are nonsingular. Then 

. , (-'^r-'^r) = {^n^n) + (-'^n^n) Xj^_,fXn-r{XnXn) 

(B.l) 

+ i^n^n) X^_^Xn-r{Xj.Xr) X^_j.Xn~r{XnXn) 

Proof. Using the identity [e.g., Searle (1982), page 261] 
(B.2) {D - CA-^B)-^ = D-^ + D~^C{A - BD^^Cy^BD-^ 

with A = I, B = Xn-r, C = X'^_j. and D = X'^Xn, we have 

[d.o) 

X Xn-r{Xj^Xn)~ ■ 
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Using (B.2) again with A = X'^Xn, B = X'^_^, C = Xn-r and D = I, we 
have 

(B.4) [I — Xn-r{XnXn) X^_^] = I + Xn-r{X^Xr) X^_j,. 

Inserting (B.4) into (B.3), we have (B.l). D 

Note that ^M,n = ^"^ EfcLl^7(fc),n and the Pl(l),n,^I{2),n,- ■ ■ :Pl{M),n 

are identically distributed. Thus, 

(B.5) Var(^^,_ J = (1 - M~') Gov (^7(i),„ J,(2), J + M'^ Var (^,(1), J. 

Define a^ = (X;X„)~ixi and hij = x'^iX'^Xry^Xj . By (11) and (12) 

r r n 

Gov (^/(i),„,,9/(2),„) = ^aia'ia'^ + 2^1 Z! ^^ij^j^^"^ 

i=l i=l j=r+l 

(B.6) 

i=r+lj=r+l 

and 

r r n 

Var(/3j(i)_„)=^aia^cr^ + 2^ J2 ^ihija'jCr'^ 

i=l i=l j=r+l 

(B.7) 

+ (1 + A) ^ Y, SLihija'ja^ + X ^ aia-a^ 

i=r+lj=r+l i=r+l 

By the definition of aj and hij , we have 

r 

(B.8) 2^aja^ = (X^X„)~ X',Xr(X^X„)~ , 

i=l 

r n 

2_^ 2_^ aihija'j = (X'^Xn) X[^_^.Xn~r[X'n^n)~ 
4=1 j=r+l 



(B.9) 



and 



n n 

Pin 

■J 



i=r+l j=r+l 



n n 



7 , / , ^hijAj 

(B.IO) ,j=j.+lj=r+l 



[X^Xn) X^_^Xn-r{^r-^r) ^n-r-^n-r{X^Xn) 
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Hence, inserting (B.8)-(B.10) into (B.6) and (B.7), and applying Xl^Xr + 
X^_j.Xn-r = ^'n^n and (B.l), we have 

(B.ii) cov(^,(i),„ J,(2),j = (x;x,)-V2 

and 

(B.12) Var (Pj^,).,) = (X;,X,,)- V^ + X[iX',Xr)-' - (^XJ- V'- 
Thus, (15) is proved by (B.5). 

To show (17), because the l3ui)^rnPi(2).ni ■ ■ ■ i(^i(M),n are identically dis- 
tributed, 

(B.13) E{BM,n) = Var (^/(i),„) - Gov CPi(i).n^Pn2),n)- 

Thus, (17) is proved by inserting (B.ll) and (B.12) into (B.13). 

To show (16), we define Y^a = (y^,, yf^J to be the vector of the augmented 

data set at the A:th repeated imputation, where y^^ = (^r+i(fc)' ■^r*+2(fc)' ■ • ■ ' ■^nffc)-''' 
A: = 1,2,...,M. Then 

{n - P)^jik),n = Y'(fc) [In - X„(X;X„)-lX;] Y(fc) 

and, under the regression model (1), 

i?{Y|,)[/„-X„(X;X„)-ix;]Y(fc)} = trace{[/„-X„(X;X„)-ix;]y(Y'(,))}. 

By (11) and (12) we have 

T/ZV/ \—f Ir Xr{X^Xr)~ X^^_,,. \ o 

^\^(k))—\ V (YlY\-lVl (^^\\Y (YlY\~lYl _L \ T " ' 



'-n—r 



^^)> \Xn-r{X',Xrr^X', {I + \)Xn-r{X',Xrr^ X'^^, + XI„ 

Let P, = XniX'^XnY^X'^ and Q = V{Y[^^)a~^ - /„. Then 

E{{n-P)iy\k),n} = E{{I- Px){Q. + In)a^} 

(B.14) 

= trace {/„- P^jo- + trace{[/„ - Pi.](5}o- . 

By the classical regression theory trace{/n — Px} = n — p. For the second 
term note that the left-upper r "x r elements of Q are all zeros. Define C = 
(X;X„)-ix;_,X„_, and P) = (X;X,)-ix;_,X„_,. Then 

trace((5) = trace {(1 + X)Xn-r{X'^Xr)'^X'^_^ + (A - l)/„_r} 

(B.15) 

= (1 + A) trace(P>) + (A - l)(n - r) 

and 

trace(P,g) = 2trace{X„_,(X;X„)-ix;_,} 

+ (1 + A)trace{x„_,(x;x„)-ix;_,x„„,(x;x,)-ix;_j 

+ (A - 1) trace {X„_,«X„)-ix;_,} 
= (l + A)trace(C + CP>). 
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Note that, using X'^Xr + X'^_^Xn-r = ^'n^n-, we have 
trace(L» - CD) 

(B.17) = trace {[/ - {X'^XnY^ X'^_,Xn-r\{XlXr)-^ X'^_,Xn-r} 

= trace {(X^Xn)" X'^_j.Xn~r} = trace(C). 
Thus, by applying (B.17) to (B.15) and (B.16), we have 
(B.18) trace(Q - P^Q) = (A - l)(n - r). 

Therefore, inserting (B.18) into (B.14), we have (16). D 

APPENDIX C 

Proof of Theorem 4.1. The variance formula (29) directly follows by 
applying the multiple imputation variance formula (B.5) to OM,n and using 
aj = ttj in (B.6) and (B.7). 

To show (30), we decompose the total variance into three parts: 

(C.l) Var(^M,n) = Var(4) + Var(^M,n - On) + 2 Cov(^„, Om^u - On). 
To compute the bias of WM,n as an estimator of Var(0„), we first express it 
as Vn = Y^riY„, where Y^ = (y.^,yn_r) and ^ = [^ij\'-> then the within- 
imputation variance term can be written Wa/^^ = M~^X]fc=i Y/^nOY^^). 

Byi?(Y(fc))=i?(Y„), 

S(%)) = i5;(Y'(,))r!i?(Y(,)) +trace{OVar(Y(fe))} 

= £;(K )+ trace {f^[Var(Y(fc))-Var(Y„)]}. 

By the unbiasedness of Vn and by the covariance structures in (11) and (12) 
we have 

-B(T^M,n)-Var(0„) 

r n n n 

(C.2) =2E E ^ijhija^ + (a^ + \a^) Y. E ^iJ^^J 

i=lj=r+l i=r+lj=r+l 

n 

+{\-i)a' y: n,j. 

j=r+l 

For the BM,n term it can be shown that, using the same argument as for 
(B.6) and (B.7), 

E{BM,n) = Var (^/(i),„) - Gov {9i{i),Ji{2),n) 
(C.3) 

^ ' inn n \ 

= ( E E "i«i^ii+ E "N^^^- 

\j=r+lj=r'+l j=r+l / 



MULTIPLE IMPUTATION ESTIMATORS 17 

By the covariance structures in (11) 

Cr r n \ 

i=l 1=1 j=r+l / 

Thus, by (29), (C.4) and Var(^„) = ELi a?f^^ 

Var(4/,n - On) = Var(^M,n) + Var(^„) - 2 Cov(0M,n, k) 
(C-5) 

^ ' / n n n \ 

= (i + Af"U) E "?+ E E «*«j^ii k' 

\ i=r+l i=r+lj=r+l / 

and, by (C.3) and (C.5), 

^[(1 + M-^)BM,n] - yav{0M,n " k) 

(C.6) 

= (A-1) 



n n n 

E «?+ E E «i"i^ii 

i=r+l i=r'+ljf=r+l 



a2. 



For the covariance term in (C.l) note that 

Cov{9n,0M,n - On) 

(C.7) 

= Cov{6n, 0oo,n — ^n) + Cov(^„, 9M,n — OoD,n) = 0, 

because the first term on the right-hand side of the above equahty is zero 
by the congeniahty condition (28) and the second term is also zero because 

Cov(6'„, 9M,n — Ooo,n) = GoV {6 n , M ,n) — Cov(0.„, 0oo,n) = 0, 

by the fact that 0/(fc) , A; = 1, 2, ... , M, are identicahy distributed. Therefore, 
(30) follows from (C.2), (C.6) and (C.7). D 

APPENDIX D 

D.l. Justification for z-statistic in (35). Let {9i,Vi), i = l,2,...,L, be 
i.i.d. samples from a bivariate distribution G{6, V) with second moments. 
Then E{V) is unbiasedly estimated by El{V) = L~^J2iLiVi and Var((9) 
is unbiasedly estimated by (L - 1)-^L x ElUO - EL{9)f] = L~^Ef=i(^ - 
El{9)Y , where El{9) = L~^ J2i=i ^i- Thus, by the central limit theorem, 

p_^) ^ _ El{V) - El[{0 - El{0)?] - [E{V) - Var(g)] 

\^^{El{V) - El[{9 - El{0)Y]} 
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converges to a N{0, 1) distribution as L ^ oo. As El{V)- ElUO- EL{0)f] = 
L~^ J2i=i[^i ~ {^i ~ -^-l(^))^]i the variance term in the denominator of (D.l) 
is consistently estimated by 

L-^El{[V - {9 - EL{9)f -EL[V-{e- EL{e)f]f] 

= L~^El{[V -{9- EL{9)f - El{V) + VarL(^)]='}. 

Thus, using Slutsky's theorem, the z-statistic in (35) converges to a A^(0, 1) 
distribution under Hq : E(y) = Var(0) as L — > oo. 
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